Corpus: cat_wikipedia_2007_30K

4.7.3.1 Most Frequent Hash Values For Sentences

Identical hash values may result from similar sentences.
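A minimal sketch of how similar (not just identical) sentences can end up with the same hash value. It assumes, hypothetically, that each sentence is normalized before hashing. The normalization rules (lowercasing, dropping punctuation and digits, collapsing whitespace), the function names `normalize` and `sentence_hash`, and the use of CRC-32 masked to 31 bits are all illustrative assumptions. The actual hash function behind these statistics is not documented here. The 31-bit mask only keeps the output in the same non-negative range as the values listed below.

```python
import re
import zlib


def normalize(sentence: str) -> str:
    """Hypothetical normalization: lowercase, drop punctuation, digits and
    underscores, then collapse runs of whitespace into single spaces."""
    lowered = sentence.lower()
    letters_only = re.sub(r"[^\w\s]|[\d_]", "", lowered)
    return " ".join(letters_only.split())


def sentence_hash(sentence: str) -> int:
    """CRC-32 of the normalized UTF-8 text, masked to a non-negative 31-bit int.
    CRC-32 and the mask are assumptions, not the corpus tool's documented method."""
    return zlib.crc32(normalize(sentence).encode("utf-8")) & 0x7FFFFFFF


if __name__ == "__main__":
    a = sentence_hash("El gat dorm al sofà.")
    b = sentence_hash("El gat dorm al sofà!")
    # The sentences differ only in punctuation, so they normalize to the
    # same string and therefore receive the same hash value.
    print(a == b)  # True
```

Under these assumptions, any two sentences that differ only in case, punctuation, digits or spacing collapse to one hash value, which is one way a count of 2 can arise without the raw sentences being byte-identical.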

Number of distinct hash values

Distinct hash values   Sentences   Ratio
29992                  30000       0.9997

Hash value   Count
731570722    2
768749748    2
288740083    2
986295512    2
511313655    2
411833945    2
138888397    2
1107367029   2
1868261280   1
2094382603   1
1624236581   1
1460006210   1
124850389    1
1335229636   1
1593952557   1
1847647742   1
1369222554   1
92464101     1
1448536857   1
789810672    1
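Both tables can be derived from the list of per-sentence hash values with a simple frequency count. The sketch below is a hypothetical helper (`hash_statistics` is not part of any published tool) that takes precomputed hash values, however they were produced, and returns the number of distinct values, the number of sentences, their ratio, and the most frequent values with their counts.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def hash_statistics(
    hashes: Iterable[int], top_n: int = 20
) -> Tuple[int, int, float, List[Tuple[int, int]]]:
    """Return (distinct hash values, sentences, ratio, top_n (hash, count) pairs)."""
    counts = Counter(hashes)
    total = sum(counts.values())       # one hash value per sentence
    distinct = len(counts)             # number of different hash values
    ratio = distinct / total if total else 0.0
    # most_common sorts by count, descending; ties keep first-seen order.
    return distinct, total, ratio, counts.most_common(top_n)


if __name__ == "__main__":
    # Synthetic input shaped like this corpus: 30000 sentences in which
    # eight hash values occur twice and all others occur once.
    synthetic = list(range(29992)) + list(range(8))
    distinct, total, ratio, top = hash_statistics(synthetic)
    print(distinct, total, round(ratio, 4))  # 29992 30000 0.9997
    print(top[:2])  # [(0, 2), (1, 2)]
```

With 30000 sentences and eight hash values that occur twice, there are 30000 − 8 = 29992 distinct values and a ratio of 29992 / 30000 ≈ 0.9997, which agrees with the first table. The default `top_n=20` matches the twenty rows listed above.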